Multi-GPU Design and Performance Evaluation of Homomorphic Encryption on GPU Clusters
Authors
Abstract
We present a multi-GPU design, implementation and performance evaluation of the Halevi-Polyakov-Shoup (HPS) variant of the Fan-Vercauteren (FV) levelled Fully Homomorphic Encryption (FHE) scheme. Our design follows a data parallelism approach and uses partitioning methods to distribute the workload in FV primitives evenly across the available GPUs. The design is put forward to address the space and runtime requirements of FHE computations. It is also suitable for distributed-memory architectures, and includes efficient GPU-to-GPU data exchange protocols. Moreover, it is user-friendly, as user intervention is not required for task decomposition, scheduling or load balancing. We implement and evaluate the performance of our design on two homogeneous and heterogeneous NVIDIA GPU clusters: K80, and a customized P100. We also provide a comparison with a recent shared-memory-based multi-core CPU implementation, using homomorphic circuits as workloads: vector addition and multiplication. Moreover, we use our multi-GPU Levelled-FHE to implement the inference circuit of Convolutional Neural Networks (CNNs) to perform homomorphically image classification on encrypted images from the MNIST and CIFAR-10 datasets. Our implementation provides 1 to 3 orders of magnitude speedup compared with the CPU implementation on vector operations. In terms of scalability, our design shows reasonable scalability curves when the GPUs are fully connected.
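The data-parallel partitioning described in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that a ciphertext polynomial is stored in residue number system (RNS) form as one residue polynomial per modulus, and that whole residue polynomials are the unit of work handed to each GPU; the abstract does not state which axis the authors partition along. The names `partition_evenly` and `add_rns` are hypothetical, and the "devices" are simulated by an ordinary loop rather than real GPU calls.

```python
from typing import List


def partition_evenly(num_items: int, num_devices: int) -> List[range]:
    """Split indices 0..num_items-1 into contiguous blocks, one per device.

    Block sizes differ by at most one, so the load is as even as possible.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if num_items < 0:
        raise ValueError("num_items must be non-negative")
    base, extra = divmod(num_items, num_devices)
    blocks = []
    start = 0
    for device in range(num_devices):
        # The first `extra` devices take one additional item each.
        size = base + (1 if device < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def add_rns(a: List[List[int]], b: List[List[int]],
            moduli: List[int], num_devices: int) -> List[List[int]]:
    """Add two polynomials in RNS form, one block of residues per device.

    a[i] and b[i] hold the coefficients modulo moduli[i]. Each residue
    polynomial is independent of the others, so blocks need no communication.
    """
    result: List[List[int]] = [[] for _ in moduli]
    for device, block in enumerate(partition_evenly(len(moduli), num_devices)):
        # Assumption: in a real system this loop body would run on GPU `device`.
        for i in block:
            q = moduli[i]
            result[i] = [(x + y) % q for x, y in zip(a[i], b[i])]
    return result


if __name__ == "__main__":
    print([list(block) for block in partition_evenly(10, 4)])
    print(add_rns([[1, 2], [3, 4]], [[6, 6], [10, 10]], [7, 11], 2))
```

Addition needs no data exchange between devices under this layout. Operations that mix residues, such as the base conversions used in RNS variants of FV multiplication, are where GPU-to-GPU exchange of the kind mentioned in the abstract would come into play.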
Related resources
Load-Balanced Isosurfacing on Multi-GPU Clusters
Isosurface extraction is a common technique applied in scientific visualization. Increasing sizes of volumes over which isosurfacing is to be applied combined with increasingly hierarchical parallel architectures present challenges for efficiently distributing isosurfacing work loads. We propose a technique that, with a modest amount of preprocessing, efficiently distributes isosurfacing load t...
Parallel Rendering on Hybrid Multi-GPU Clusters
Achieving efficient scalable parallel rendering for interactive visualization applications on medium-sized graphics clusters remains a challenging problem. Framerates of up to 60 Hz require a carefully designed and fine-tuned parallel rendering implementation that fits all required operations into the 16 ms time budget available for each rendered frame. Furthermore, modern commodity hardware embr...
Parallel Sorting on GPU Clusters
It is becoming more common to install modern graphics cards on small to medium size commodity clusters. In addition to applications such as display walls and CAVE environments, graphics cards can be used as dedicated coprocessors that can run certain parallel algorithms very quickly. Sorting has been long recognized as an important algorithm in terms of both mathematical analysis and a way to j...
Multi-level parallelism for incompressible flow computations on GPU clusters
We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-g...
Tuning And Understanding MILC Performance In Cray XK6 GPU Clusters
Graphics Processing Units (GPUs) are becoming increasingly popular in high performance computing due to their high performance, high power efficiency, and low cost. Lattice QCD is one of the fields that has successfully adopted GPUs and scaled to hundreds of them. In this paper, we report our Cray XK6 experience in profiling and understanding performance for MILC, one of the Lattice QCD computat...
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2021
ISSN: 1045-9219, 1558-2183, 2161-9883
DOI: https://doi.org/10.1109/tpds.2020.3021238